Introduction to Statistics
Bennett Kleinberg
Week 10
Statistical power
- Part 1: What is statistical power?
- Part 2: How do we calculate statistical power?
Part 1: What is statistical power?
Back to week 4
Two kinds of errors: Type I errors and Type II errors
Type I errors
Analogy: false positives
We conclude there is a difference (an effect), but it’s a false alarm (in reality there is no effect).
In hypothesis terms: we reject the null but shouldn’t have done so.
Type I errors
We want to keep that error low.
i.e. we want to be quite sure that there is an effect.
This is all contained in the alpha level: under the null, a proportion of exactly \(\alpha\) lies in the critical region.
For \(\alpha=0.01\), 1% of the values under the null lie in that area.
Thus: in 1% of the cases, we will incorrectly conclude that there is an effect.
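To see that \(\alpha\) rate in action, here is a minimal simulation sketch (Python with numpy and scipy; the tooling is an assumption, not part of the slides):

```python
# Illustrative sketch: simulate many sample means under the null and
# count how often they land (by chance) in the one-sided critical region.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha = 0.01
z_crit = norm.ppf(1 - alpha)            # one-sided critical z, ~2.33

z_means = rng.standard_normal(100_000)  # z-scored sample means under H0
print((z_means > z_crit).mean())        # ~0.01: false alarms at the alpha rate
```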
Today: Type II errors
Analogy: missed effects.
We conclude that there is no difference, but in reality there is one (i.e. we miss the effect).
In hypothesis terms: we fail to reject the null hypothesis although we should have done so.
This error term is called \(\beta\).
Inference errors
- Type I errors: we keep these low by setting \(\alpha\) low
- Type II errors: we want these low as well!
But there’s no free lunch in statistics!
Statistical power
- the Type II error is the failure to reject the null hypothesis although we should have done so
- the probability of this error is called \(\beta\)
The statistical power of a test is \(1-\beta\).
Power and \(\beta\)

Statistical power
Another way of understanding statistical power:
Statistical power is the probability that a (hypothesis) test will correctly reject \(H_0\).
Graphical explanation
- suppose we test the IQ score:
- the IQ scores are distributed normally with \(\mu = 100\) and \(\sigma = 15\)
- we now give a sample of \(n=20\) people 3 cups of espresso each before they take the IQ test
- suppose the espresso trick is pure magic: it leads to a full shift of +0.50 SD (7.5 points)
\(H_0: \mu= 100\)
\(H_0\) distribution

Espresso trick distribution

Both together

Stepwise
- define alpha as \(\alpha = .05\)
- one-sided critical z-value: \(z=1.65\)
- translates to \(1.65 = \frac{M-100}{\sigma_M}\) with \(\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{20}} = 3.35\), so \(1.65 = \frac{M-100}{3.35} \leftrightarrow M = 105.53\)
So we know that the critical region starts at \(M=105.53\) (for \(n=20\))
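The same arithmetic as a short Python sketch (scipy assumed; the slides use the table value \(z=1.65\), which rounds \(z=1.645\)):

```python
# Sketch: derive the critical sample mean from alpha, n and sigma.
from scipy.stats import norm

mu0, sigma, n, alpha = 100, 15, 20, 0.05
sigma_m = sigma / n**0.5          # sigma_M = 15 / sqrt(20) ~ 3.35
z_crit = norm.ppf(1 - alpha)      # one-sided z ~ 1.645
print(mu0 + z_crit * sigma_m)     # ~105.52 (105.53 with z rounded to 1.65)
```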
\(\alpha\)

\(\alpha\) and \(\beta\)

Locating the errors
- we can now say that “green area” = critical region where we reject \(H_0\) with \(n=20\)
- i.e. “green” = \(\alpha\)
- so we can also say where \(\beta\) is:
- \(\beta\) [=“blue”] is the area (probability) where we fail to reject \(H_0\) although we should have!
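In code, that “blue” area is a single normal probability under \(H_A\) (again a scipy sketch; the numbers come from the espresso example above):

```python
# Sketch: beta = P(sample mean stays below the critical value | H_A is true).
from scipy.stats import norm

crit, mu_a, sigma_m = 105.53, 107.5, 3.35
beta = norm.cdf((crit - mu_a) / sigma_m)
print(beta)   # ~0.28: the probability of missing the real effect
```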
Bringing it all together
- if we know \(\alpha\), then we also know \(1-\alpha\): the probability, under the null, of correctly not rejecting \(H_0\)
- and if we know \(\beta\), then we also know \(1-\beta\): the probability of correctly rejecting \(H_0\)
\(1-\alpha\)

\(1-\beta\)

Bringing it all together
- the “lightblue” area is \(1-\beta\) = statistical power
“So if we want to increase power [=lightblue], why don’t we just make \(\beta\) [=darkblue] smaller?”
The relationship of \(\alpha\) and \(\beta\)
- the boundary of \(\alpha\) for \(H_0\) is also
- the boundary of \(\beta\) for \(H_A\)
Less strict \(\alpha\)

Stricter \(\alpha\)

Always a compromise!
- if we make \(\alpha\) stricter (= decrease it), we increase \(\beta\), so we decrease the statistical power \(1-\beta\)
- if we want \(1-\beta\) to be higher (= more power), we must decrease \(\beta\), which for a fixed design means increasing the Type I error \(\alpha\)
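A quick sketch of the trade-off (scipy assumed): for fixed \(n\) and effect size, a stricter \(\alpha\) pushes the boundary to the right and inflates \(\beta\).

```python
# Sketch: stricter alpha -> larger beta, for the fixed espresso design.
from scipy.stats import norm

mu0, mu_a, sigma_m = 100, 107.5, 3.35
for alpha in (0.10, 0.05, 0.01):
    crit = mu0 + norm.ppf(1 - alpha) * sigma_m  # boundary under H_0
    beta = norm.cdf((crit - mu_a) / sigma_m)    # miss probability under H_A
    print(f"alpha={alpha:.2f}  beta={beta:.3f}")  # beta grows from ~0.17 to ~0.54
```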
Two solutions
- increasing sample size \(n\)
From \(n=20\) to \(n=40\)

From \(n=20\) to \(n=100\)

Two solutions
- increasing sample size \(n\)
- larger effects
Remember Cohen’s d?
- \(d=\frac{\mu_{treatment} - \mu_0}{\sigma} = \frac{107.50 - 100}{15} = 0.5\)
What if we doubled \(d\)?
From \(d=0.5\) to \(d=1.0\)

Factors that matter
- Statistical power increases if we:
- increase \(n\)
- increase the effect size of interest
- increase \(\alpha\)
- Statistical power decreases if we:
- decrease \(n\)
- decrease the effect size of interest
- decrease \(\alpha\)
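These factors fit into one small helper function (a sketch; `power_one_sided` is an illustrative name and scipy is assumed):

```python
# Sketch: one-sided power of a z-test with known sigma, as a function
# of n, effect size d and alpha.
from scipy.stats import norm

def power_one_sided(n, d, alpha=0.05, mu0=100, sigma=15):
    sigma_m = sigma / n**0.5
    crit = mu0 + norm.ppf(1 - alpha) * sigma_m    # start of the critical region
    mu_a = mu0 + d * sigma                        # mean under H_A
    return 1 - norm.cdf((crit - mu_a) / sigma_m)  # P(M > crit | H_A)

print(power_one_sided(20, 0.5))        # baseline espresso example, ~0.72
print(power_one_sided(100, 0.5))       # larger n: power goes up
print(power_one_sided(20, 1.0))        # larger effect: power goes up
print(power_one_sided(20, 0.5, 0.01))  # stricter alpha: power goes down
```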
Part 2: How do we calculate statistical power?
Our example
- IQ scores that are distributed normally with \(\mu = 100\) and \(\sigma = 15\)
- we now give a sample of \(n=20\) people 3 cups of espresso each before they take the IQ test
- suppose the espresso trick is pure magic: it leads to a full shift of +0.50 SD (7.5 points)
Steps to calculate power
- Critical region under \(H_0\)
- Region in \(H_A\) “beyond” the critical region of \(H_0\)
Critical region
- for \(\alpha = .05\)
- one-sided critical z-value: \(z=1.65\)
- translates to \(1.65 = \frac{M-100}{\sigma_M}\) with \(\sigma_M = \frac{15}{\sqrt{20}} = 3.35\), so \(1.65 = \frac{M-100}{3.35} \leftrightarrow M = 105.53\)
This is the value under \(H_0\) which demarcates the critical region of “statistical significance”
Any \(M > 105.53\) means we reject \(H_0\).
Statistical power is about \(H_A\):
- so we need the probability under \(H_A\) of values that are larger than the critical value of \(H_0\)
Calculating power
[= lightblue]
- probability under \(H_A\) that is larger than the critical value of \(H_0\) (i.e. 105.53)
\(z=\frac{M-\mu}{\sigma_M} = \frac{105.53-107.50}{3.35} = -0.59\)
Thus we know that 105.53 in \(H_A\) corresponds to \(z=-0.59\).
The power is thus the body of the distribution!
Table lookup
- For \(z=-0.59\):
- proportion in tail = 0.2776
- proportion in body = 0.7224
The statistical power here is 0.7224.
We have a 72.24% chance of rejecting \(H_0\) when we should (i.e. when \(H_A\) is true).
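The table lookup can be reproduced in one line (scipy sketch):

```python
# Sketch: the body proportion at z = -0.59 is the power.
from scipy.stats import norm

z = (105.53 - 107.50) / 3.35    # ~ -0.59
print(1 - norm.cdf(z))          # ~0.72, the proportion in the body
```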
Another example
- IQ score \(\sim N(100, 15)\)
- Brain food promises an increase of \(d=0.8\)
What is the achieved statistical power for \(n=40\) and \(\alpha=.01\)?
Steps
- Critical value under \(H_0\)?
Needed: tail probability of \(p = .01\) → \(z=2.32\)
Steps
- Value that corresponds to critical z:
\(2.32 = \frac{M-100}{\sigma_M}\) with
- \(\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{40}} = 2.37\)
So: \(2.32 = \frac{M-100}{2.37} \leftrightarrow M = 105.50\)
Steps
- Obtaining statistical power
- probability under \(H_A\) that is larger than the critical value of \(H_0\) (here: 105.50)
For this we need to know a bit more about \(H_A\)…
Steps
We need the mean of \(H_A\):
- Mean of \(H_A\)
- We know that \(d=0.8 \leftrightarrow 0.8 = \frac{M-100}{15} \leftrightarrow M = 112\)
Cohen’s d of 0.8 translates to an IQ of 112.
Steps
Back to the last step again:
- probability under \(H_A\) that is larger than the critical value of \(H_0\) (here: 105.50)
\(z=\frac{105.50-112}{2.37} = \frac{-6.50}{2.37} = -2.74\)
Exact power
We know that the power is the body proportion (and corresponding probability), so:
Power = .9969
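The whole example condenses to a few lines (scipy sketch; the table-based slides get 105.50 because they round \(z\) to 2.32):

```python
# Sketch: brain-food example with n = 40, alpha = .01, d = 0.8.
from scipy.stats import norm

sigma_m = 15 / 40**0.5                        # ~2.37
crit = 100 + norm.ppf(1 - 0.01) * sigma_m     # ~105.52 (105.50 by table)
mu_a = 100 + 0.8 * 15                         # mean under H_A = 112
print(1 - norm.cdf((crit - mu_a) / sigma_m))  # ~0.997, matching the table result
```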
All in one plot

In the live session
- power examples by hand and step-by-step
- additional example on CIs
- formulas clarification
Recap
- the relationship between inference error types (Type I and Type II)
- the relationship between power and sample size, effect size and alpha
- calculating power by hand